Structural genomics of the SARS coronavirus: cloning, expression, crystallization and preliminary crystallographic study of the Nsp9 protein


The aetiologic agent of the recent epidemics of Severe Acute Respiratory Syndrome (SARS) is a positive-stranded RNA virus (SARS-CoV) belonging to the Coronaviridae family, and its genome differs substantially from those of other known coronaviruses. SARS-CoV is transmitted mainly by the respiratory route, and to date there is no vaccine and no prophylactic or therapeutic treatment against this agent. A SARS-CoV whole-genome approach has been developed with the aim of determining the crystal structures of all of its proteins or domains. These studies are expected to greatly facilitate drug design. Coronavirus genomes are between 27 and 31.5 kb in length, the largest among known RNA viruses, and encode 20-30 mature proteins. The functions of many of these polypeptides, including the Nsp9-Nsp10 replicase-cleavage products, are still unknown. Here, the cloning, Escherichia coli expression, purification and crystallization of the SARS-CoV Nsp9 protein, the first SARS-CoV protein to be crystallized, are reported. Nsp9 crystals diffract to 2.8 Å resolution and belong to space group P6₁22 or P6₅22, with unit-cell parameters a = b = 89.7, c = 136.7 Å. With two molecules in the asymmetric unit, the solvent content is 60% (V_M = 3.1 Å³ Da⁻¹).


1. Introduction 


The recent epidemics of Severe Acute Respiratory Syndrome (SARS) represent a paradigm for emerging viral pathogens, as well as an example of worldwide coordinated efforts to control a serious viral outbreak and a test of the reaction time of the scientific community. The first cases of SARS originated in Guangdong province, in south-east China. The number of reported cases and our knowledge of this illness are still evolving, but a number of basic facts have been firmly established. The aetiologic agent of SARS is a positive-stranded RNA virus belonging to the Coronaviridae family, and its genome differs substantially from those of previously identified coronaviruses, including two other human coronaviruses (Peiris et al., 2003; Ksiazek et al., 2003; Drosten et al., 2003; Snijder et al., 2003). The virus, now generally known as SARS-CoV, is transmitted mainly by the respiratory route, although evidence for a secondary faecal-oral route of transmission has also been presented. The viral strain probably first infected wild animals traded in Asian markets and then crossed the species barrier to infect humans.

To date there is no vaccine and no prophylactic or therapeutic treatment against this agent. A prophylactic treatment would have been useful to combat the epidemics; in its absence, the only effective measure available to prevent the spread of the virus is to quarantine all persons who have been exposed to SARS-CoV. The number of antiviral molecules that can be used to treat patients infected by RNA viruses is very low. It is therefore important to search for efficient antiviral drugs against a large number of RNA viruses, giving priority to viruses transmitted by the respiratory route because they have the highest potential for causing pandemic outbreaks.

The scientific community has reacted promptly and efficiently to identify and characterize this new infectious agent, to develop methods for SARS-CoV detection and to establish containment protocols. In the meantime, a broad effort is being made to design drugs active against SARS-CoV. Ribavirin has been used in the absence of other candidates, but its intrinsic efficacy against SARS-CoV appears to be low (Koren et al., 2003).

To select drugs active against a viral pathogen, one usually relies on screening candidate compounds for their efficacy in virus-infected cell cultures and/or animal models. However, during current research on drugs for treating hepatitis C virus (HCV) infections, a novel and promising approach has been introduced. The RNA-dependent RNA polymerase of HCV has been purified and crystallized, and enzymatic assays have been used to find potent nucleoside and non-nucleoside inhibitors of the virus whose structure-activity relationships support further testing and clinical development (de Francesco et al., 2003). This approach is gaining momentum owing to concomitant advances in new technologies. Among these, structural genomics approaches are being pursued to solve the crystal structures of large sets of clinically relevant proteins, which will become the subjects of future structure-function studies.

A crystal structure has not yet been determined for any of the 28 predicted mature SARS-CoV proteins. The crystal structure of the main (or 3CL) protease of transmissible gastroenteritis virus, a related coronavirus, has been determined and used to construct a model of the SARS-CoV 3CL protease, facilitating future drug design against this important target (Anand et al., 2003). The putative coronavirus RNA-dependent RNA polymerase has been purified but is inactive in vitro (Grotzinger et al., 1996).

In this context, we have developed a SARS-CoV whole-genome approach aimed at determining the crystal structures of all SARS-CoV proteins. We anticipate that this will greatly facilitate drug design as well as the study of many other aspects of the biology of these complex viruses.


Coronaviruses are enveloped viruses with a single-stranded RNA genome of positive polarity (Lai & Holmes, 2001). Their genomes are between 27 and 31.5 kb in length, the largest among known RNA viruses. Like other coronaviruses, SARS-CoV encodes two large replicase polyproteins (the ORF1a and ORF1ab proteins), which are processed into a set of mature non-structural proteins (Nsps) by internal viral proteases (Snijder et al., 2003). The functions of many of these products, such as the Nsp9-Nsp10 polypeptides produced from the C-terminal domain of the ORF1a-encoded polyprotein, are still unknown. In the related mouse hepatitis virus, a group 2 coronavirus, the counterpart of SARS-CoV Nsp9 is a 12 kDa cleavage product (p1a-12) that is found preferentially in the perinuclear region of infected cells, where it co-localizes with other components of the viral replication complex (Bost et al., 2000). No clues to the function of the Nsp9 equivalent of any coronavirus have been obtained thus far. Here, we report the cloning, expression, purification and crystallization of the SARS-CoV Nsp9 protein, a 113-residue protein (Fig. 1) which is the first SARS-CoV protein to be crystallized.


2. Materials and methods
2.1. Infection and RNA isolation 


Vero cells were infected with SARS-CoV (Frankfurt-1 strain; NCBI Accession No. AY291315; Drosten et al., 2003) at a multiplicity of infection of 0.01. At the onset of the cytopathogenic effect (approximately 40 h post-infection), intracellular RNA was isolated by cell lysis for 10 min at room temperature with 5% lithium dodecyl sulfate in LET buffer (100 mM LiCl, 1 mM EDTA, 10 mM Tris-HCl pH 7.4) containing 20 µg ml⁻¹ proteinase K. After the cellular DNA had been sheared using a syringe, lysates were incubated at 315 K for 15 min and extracted with phenol (pH 4.0) and chloroform, and the RNA was ethanol-precipitated. cDNA was obtained by reverse transcription with Thermoscript reverse transcriptase (Invitrogen) using primer SAV009 (5'-GGACAGCAACCGCTGGACAATC-3'), which is complementary to nucleotides 13644-13665 of the Frankfurt-1 genome.
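As a quick consistency check on the reverse-transcription primer, the short Python sketch below confirms that SAV009 is 22 nt long, matching the 13644-13665 span, and derives the genome-sense sequence (written as DNA) that it should anneal to, assuming perfect complementarity. The GC fraction and the Wallace-rule melting temperature are rules of thumb added here for illustration; they are not values reported by the authors, and the helper names are hypothetical.

```python
# Sanity checks on primer SAV009 as written in the text.
# Assumption: the primer is exactly complementary to genome positions
# 13644-13665 (1-based, inclusive); no genome sequence is used here.

PRIMER_SAV009 = "GGACAGCAACCGCTGGACAATC"
GENOME_START, GENOME_END = 13644, 13665

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) from the Wallace rule, 2(A+T) + 4(G+C).

    Only a rule of thumb for short oligonucleotides, used here for
    illustration; it is not the method used by the authors.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    span = GENOME_END - GENOME_START + 1
    print("primer length:", len(PRIMER_SAV009), "genome span:", span)
    # Genome-sense sequence (as DNA) that the primer should anneal to.
    print("target (5'->3'):", reverse_complement(PRIMER_SAV009))
    print(f"GC fraction: {gc_fraction(PRIMER_SAV009):.2f}")
    print("Wallace Tm (deg C):", wallace_tm(PRIMER_SAV009))
```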


2.2. Subcloning, Escherichia coli protein expression and purification


The SARS-CoV Nsp9-coding sequence was amplified by PCR from the cDNA prepared above using two primers containing the attB sites of the Gateway recombination system (Invitrogen). At the 5' end of the gene, a sequence encoding a hexahistidine tag was attached. The cDNA was then subcloned into the pDest14 plasmid (Invitrogen). The open reading frame of the final construct (referred to as pDest14/Nsp9-HN and encoding an N-terminally His-tagged version of SARS-CoV ORF1a polyprotein residues 4118-4230) was checked by sequencing (MilleGen, Toulouse, France). Expression was performed in E. coli strain C41(DE3) (Avidis SA, France) transformed with the pLysS plasmid (Novagen). This plasmid carries the lysozyme gene, allowing tight regulation of expression, and supplies the tRNAs for six rare codons that are used with very low frequency in E. coli. Cultures were grown at 310 K until the OD600 reached 0.6 and were then kept on ice for 2 h; 2% ethanol was added to induce stress chaperones (Gong & Shuman, 2002). Expression was induced by adding 50 µM IPTG and the cells were incubated for 16 h at 290 K. Cells were collected by centrifugation and the bacterial pellets were resuspended in 50 mM Tris-HCl, 150 mM NaCl, 10 mM imidazole pH 8.0 and frozen.

Figure 1
Alignment of the sequence of the SARS-CoV Nsp9 protein with those of bovine coronavirus (BCV) and transmissible gastroenteritis virus (TGV). Conserved residues are identified with a black background. Homologous residues are boxed.

Cell suspensions were thawed in the presence of 0.25 mg ml⁻¹ lysozyme, 0.1 µg ml⁻¹ DNase and 20 mM MgSO₄ and were centrifuged at 12 000g. The supernatant was applied onto a Ni-affinity column connected to an FPLC system (Amersham Pharmacia Biotech). The protein was eluted with 50 mM Tris-HCl, 150 mM NaCl, 250 mM imidazole pH 8.0 and then applied onto a preparative Superdex 200 gel-filtration column pre-equilibrated in 10 mM Tris-HCl, 300 mM NaCl pH 8.0. The recombinant protein was characterized by N-terminal sequencing, mass spectrometry, dynamic light scattering (DLS) and circular dichroism (CD).


2.3. Protein characterization 


DLS was performed with a Dynapro Microsampler (Protein Solutions) using a protein solution at 5.8 mg ml⁻¹ in 10 mM Tris-HCl, 300 mM NaCl pH 8.0. The CD spectrum of the final purified product was recorded between 185 and 260 nm on a JASCO J810 spectrometer using a protein solution at 0.1 mg ml⁻¹ in sodium phosphate buffer pH 7.0 containing 25 mM NaCl.


2.4. Crystallization 


Crystallization screening was performed by vapour diffusion in nanodrops using a Cartesian robot as described previously (Sulzenbacher et al., 2002; Vincentelli et al., 2003). Briefly, three commercial kits were used: Wizard Screens 1 and 2 (Emerald BioStructures), Structure Screens 1 and 2 and the Stura Footprint screen (Molecular Dimensions Ltd). Crystals were obtained with 2.0 M ammonium sulfate, 0.1 M phosphate-citrate pH 4.2 and a protein concentration of 5.8 mg ml⁻¹ in the gel-filtration buffer. Crystallization conditions were optimized with nanodrops in a two-dimensional matrix (Lartigue et al., 2003) covering a precipitant range of 1.8-2.2 M ammonium sulfate and a pH range of 4.0-4.5 (0.1 M phosphate-citrate), yielding crystals of ~100 × 100 × 80 µm (Fig. 2).

Figure 2
Optimized crystals of the SARS-CoV Nsp9 protein. The scale bar is 100 µm.

Table 1
Crystal parameters and data-reduction statistics of the Nsp9 protein crystals. Values in parentheses are for the last resolution shell.

Space group                    P6₁22 or P6₅22
Unit-cell parameters (Å)       a = b = 89.7, c = 136.7
Beamline                       ID14-EH1 at ESRF (λ = 0.934 Å)
Resolution (Å)                 26.0-2.8 (2.94-2.8)
Rsym (%)                       5.3 (28.1)
I/σ(I)                         9.9 (2.5)
No. reflections                90899 (11486)
No. unique reflections         8395 (1166)
Completeness (%)               98.7 (98.7)
Multiplicity                   10.8 (9.9)
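The two-dimensional optimization matrix can be pictured as a grid crossing the precipitant and pH ranges given above. The Python sketch below generates such a grid; the 5 × 6 layout and the helper names are assumptions for illustration only, since the dimensions of the matrix actually used are not stated.

```python
# Hypothetical layout of a two-dimensional optimization matrix:
# ammonium sulfate concentration (rows) versus pH of 0.1 M phosphate-citrate
# (columns). The ranges come from the text; the 5 x 6 grid size is assumed.

from typing import List, Tuple


def linspace(start: float, stop: float, n: int) -> List[float]:
    """n evenly spaced values from start to stop, inclusive (n >= 2)."""
    step = (stop - start) / (n - 1)
    # Rounding removes floating-point noise such as 1.9000000000000001.
    return [round(start + i * step, 3) for i in range(n)]


def optimization_grid(
    salt_range: Tuple[float, float] = (1.8, 2.2),
    ph_range: Tuple[float, float] = (4.0, 4.5),
    n_salt: int = 5,
    n_ph: int = 6,
) -> List[List[Tuple[float, float]]]:
    """Return a row-major grid of (ammonium sulfate [M], pH) conditions."""
    salts = linspace(*salt_range, n_salt)
    phs = linspace(*ph_range, n_ph)
    return [[(s, p) for p in phs] for s in salts]


if __name__ == "__main__":
    for row in optimization_grid():
        print("  ".join(f"{s:.2f} M/pH {p:.1f}" for s, p in row))
```

A finer grid is obtained simply by raising `n_salt` or `n_ph`; the ranges stay those reported in the text.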


2.5. Data collection 


The crystals were cryocooled in pure DC200 silicone oil. Data were collected on beamline ID14-EH1 at the ESRF, Grenoble, using an ADSC Quantum Q4R detector. A total of 110 1° oscillation images were recorded with a crystal-to-detector distance of 180 mm and an exposure time of 9 s per frame. Diffraction data were integrated with DENZO (Otwinowski & Minor, 1997) and reduced with SCALA (Collaborative Computational Project, Number 4, 1994).


3. Results and discussion 
3.1. E. coli protein expression and purification 


We have subcloned 35 SARS-CoV targets into the Gateway system, including 20 full-length proteins and 15 protein domains. To date, 70 constructs have been generated, of which 28 were expressed, 14 were soluble and five were purified. Four of these led to small crystals, including those of the Nsp9 protein described in this report. Selenomethionine-substituted Nsp9 was expressed using the methionine-biosynthesis pathway inhibition method (Doublié, 1997). The selenomethionine-labelled protein was purified as described above and crystal optimization is under way.
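The construct statistics above translate into stage-wise yields. The Python sketch below simply restates the quoted counts, treating the four crystallizing proteins as a subset of the five purified ones; it is an illustrative recalculation, not part of the original analysis.

```python
# Stage-by-stage yield of the SARS-CoV construct pipeline, using the counts
# quoted in the text ("crystallized" = led to small crystals).

STAGES = [
    ("constructs", 70),
    ("expressed", 28),
    ("soluble", 14),
    ("purified", 5),
    ("crystallized", 4),
]


def stage_yields(stages):
    """Return (stage, yield vs previous stage, yield vs first stage) tuples."""
    first = stages[0][1]
    rows = []
    for (_, prev_n), (name, n) in zip(stages, stages[1:]):
        rows.append((name, n / prev_n, n / first))
    return rows


if __name__ == "__main__":
    for name, step, overall in stage_yields(STAGES):
        print(f"{name:<13} step {step:6.1%}  overall {overall:6.1%}")
```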


3.2. Data collection and reduction 


Nsp9 crystals diffract to 2.8 Å resolution on beamline ID14-EH1 (ESRF, Grenoble). Data integration and reduction are consistent with space group P622. Rsym is 5.3%, an excellent value considering the redundancy of the data (Table 1). Along the c axis, 00l reflections are observed only at l = 6n, indicating that the space group is either P6₁22 or its enantiomorph P6₅22. The unit-cell parameters are a = b = 89.7, c = 136.7 Å, which lead to a V_M value of 3.1 Å³ Da⁻¹ (60% solvent) with two molecules in the asymmetric unit (Matthews, 1968). The observed intensity distributions for centric and acentric reflections overlap with the theoretical curves, indicating that merohedral twinning, often observed in trigonal or hexagonal crystals, is not present.
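The crystallographic numbers in this paragraph can be re-derived with a few lines of Python. The sketch below computes the hexagonal cell volume, the Matthews coefficient and solvent fraction (using the common approximation 1 - 1.23/V_M), and the most restrictive sixfold axis compatible with a set of observed 00l indices. The monomer mass of 13 kDa, a rough figure for the His-tagged 113-residue construct, is an assumption not given in the text; with it, the sketch gives V_M ≈ 3.05 Å³ Da⁻¹ and about 60% solvent, close to the values quoted above. The function names are hypothetical.

```python
import math

# Assumptions (not from the text): a monomer mass of ~13 kDa, and
# 12 asymmetric units per cell, which holds for both P6(1)22 and P6(5)22.


def hexagonal_cell_volume(a: float, c: float) -> float:
    """Unit-cell volume (A^3) of a hexagonal cell with a = b and gamma = 120 deg."""
    return a * a * c * math.sin(math.radians(120.0))


def matthews(volume: float, mass_da: float, mols_per_asu: int, n_asu: int):
    """Return (V_M in A^3/Da, solvent fraction) after Matthews (1968).

    The solvent fraction uses 1 - 1.23 / V_M, which assumes a protein
    partial specific volume of about 0.74 cm^3/g.
    """
    vm = volume / (mass_da * mols_per_asu * n_asu)
    return vm, 1.0 - 1.23 / vm


def sixfold_screw_from_00l(observed_l):
    """Most restrictive sixfold axis consistent with the observed 00l indices.

    Reflection conditions along c: 6 (none), 6_3 (l = 2n),
    6_2/6_4 (l = 3n), 6_1/6_5 (l = 6n).
    """
    ls = [abs(l) for l in observed_l if l != 0]
    if not ls:
        raise ValueError("need at least one observed 00l reflection with l != 0")
    if all(l % 6 == 0 for l in ls):
        return ["6_1", "6_5"]
    if all(l % 3 == 0 for l in ls):
        return ["6_2", "6_4"]
    if all(l % 2 == 0 for l in ls):
        return ["6_3"]
    return ["6"]


if __name__ == "__main__":
    v = hexagonal_cell_volume(89.7, 136.7)
    vm, solvent = matthews(v, mass_da=13_000, mols_per_asu=2, n_asu=12)
    print(f"V_cell = {v:.0f} A^3, V_M = {vm:.2f} A^3/Da, solvent = {solvent:.0%}")
    print(sixfold_screw_from_00l([6, 12, 18, 24]))
```

Note that the 00l rule cannot separate P6₁22 from P6₅22; the two enantiomorphs are usually distinguished later, during phasing or refinement.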


3.3. Characterization 


SARS-CoV Nsp9 has been purified to homogeneity in two steps. The identity of the final product has been confirmed by N-terminal sequencing. The oligomeric state of Nsp9 has been checked using gel filtration and DLS. Gel filtration indicates that the protein is monomeric, whereas DLS analysis is consistent with a monodisperse species with an apparent Stokes radius of 26 Å and an equivalent mass of 31 kDa, which would correspond to a dimer. This discrepancy might reflect the different protein concentrations used in the two techniques.
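DLS does not measure the Stokes radius directly: it measures a translational diffusion coefficient D, which is converted with the Stokes-Einstein relation R_h = k_B T / (6πηD). The Python sketch below shows that conversion in both directions; the temperature (293.15 K) and water-like viscosity (1.0 mPa s) are assumptions, as the measurement conditions are not given. Converting R_h into an equivalent mass such as the 31 kDa quoted above usually relies on an empirical globular-protein model in the instrument software, which is not reproduced here.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K (exact SI value)


def diffusion_coefficient(radius_m: float, temp_k: float, viscosity_pa_s: float) -> float:
    """Translational diffusion coefficient (m^2/s) from the Stokes-Einstein relation."""
    return BOLTZMANN * temp_k / (6.0 * math.pi * viscosity_pa_s * radius_m)


def hydrodynamic_radius(diff_m2_s: float, temp_k: float, viscosity_pa_s: float) -> float:
    """Inverse relation: hydrodynamic (Stokes) radius in metres."""
    return BOLTZMANN * temp_k / (6.0 * math.pi * viscosity_pa_s * diff_m2_s)


if __name__ == "__main__":
    # Assumed conditions: 293.15 K and water-like viscosity (1.0e-3 Pa s).
    d = diffusion_coefficient(26e-10, 293.15, 1.0e-3)  # 26 A = 26e-10 m
    print(f"D = {d:.2e} m^2/s")
    print(f"R_h = {hydrodynamic_radius(d, 293.15, 1.0e-3) * 1e10:.1f} A")
```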

A PSI-BLAST search retrieved seven homologous sequences, all belonging to members of the Coronaviridae family. They were aligned using MULTALIN (Corpet, 1988) with standard options. The consensus of the secondary-structure predictions obtained with JPRED (Cuff et al., 1998), PSI-PRED (McGuffin et al., 2000) and PREDICT PROTEIN (Rost, 1996) converges to a fold of seven β-strands. A fold-recognition analysis was performed with the threading programs 3D-PSSM (Kelley et al., 2000) and INBGU (Fischer, 2000). Both programs fail to detect any homologue of Nsp9 of known structure, but converge to a fold of two seven-stranded β-sheets. In agreement with these predictions, the CD spectrum of purified Nsp9 reveals a structured protein formed predominantly of β-strands (35%) and β-turns (18%), but which also contains 15% α-helix. Random-coil segments account for the remaining 32%.


4. Conclusion 


The SARS-CoV Nsp9 protein expressed in E. coli was readily crystallized using the nanodrop screening (Sulzenbacher et al., 2002) and optimization (Lartigue et al., 2003) approaches. The crystals diffract to 2.8 Å resolution and should be amenable to structure determination by SeMet substitution and MAD methods (Hendrickson, 1991) at synchrotron sources.